Abstract — Building on the recent work of Johnson (2007) and
Yu (2008), we prove that entropy is a concave function with
respect to the thinning operation $T_\alpha$. That is, if X and Y
are independent random variables on Z+ with ultra-log-concave 
probability mass functions, then 

$$H(T_\alpha X + T_{1-\alpha} Y) \ge \alpha H(X) + (1-\alpha) H(Y), \quad 0 \le \alpha \le 1,$$

where H denotes the discrete entropy. This is a discrete analogue 
of the inequality (h denotes the differential entropy) 

$$h(\sqrt{\alpha}\, X + \sqrt{1-\alpha}\, Y) \ge \alpha h(X) + (1-\alpha) h(Y), \quad 0 \le \alpha \le 1,$$

which holds for continuous X and Y with finite variances 
and is equivalent to Shannon's entropy power inequality. As a 
consequence we establish a special case of a conjecture of Shepp 
and Olkin (1981). Possible extensions are also discussed.

Index Terms — binomial thinning; convolution; entropy power 
inequality; Poisson distribution; ultra-log-concavity. 

I. Introduction 

This paper considers information-theoretic properties of the 
thinning map, an operation on the space of discrete random 
variables, based on random summation. 

Definition 1 (Rényi, [10]): For a discrete random variable
X on $\mathbb{Z}_+ = \{0, 1, \ldots\}$, the thinning operation $T_\alpha$ is defined
by

$$T_\alpha X = \sum_{i=1}^{X} B_i,$$

where the $B_i$ are (i) independent of each other and of X and
(ii) identically distributed Bernoulli($\alpha$) random variables, i.e.,
$\Pr(B_i = 1) = 1 - \Pr(B_i = 0) = \alpha$ for each i.

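Equivalently, if the probability mass function (pmf) of X is
f, then the pmf of $T_\alpha X$ is

$$(T_\alpha f)_i = \sum_{j \ge i} f_j\, \mathrm{bi}(i; j, \alpha),$$

where $\mathrm{bi}(i; j, \alpha) = \binom{j}{i} \alpha^i (1-\alpha)^{j-i}$ is the binomial pmf. (Note
that we write $T_\alpha$ for the map acting on the pmf as well as
acting on the random variable.)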
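As a concrete check of this formula, the following minimal Python sketch evaluates $(T_\alpha f)_i$ for a pmf stored as a finite list (finite support is an assumption of the sketch; the names `thin` and `binomial_pmf` are ours). Since each success of a Bin(n, p) variable survives thinning independently with probability $\alpha$, thinning Bin(4, 1/2) by $\alpha = 1/2$ should give Bin(4, 1/4).

```python
from math import comb


def thin(f, alpha):
    """pmf of T_alpha X, given f[j] = Pr(X = j) for j = 0, ..., len(f) - 1.

    (T_alpha f)_i = sum_{j >= i} f_j * C(j, i) * alpha**i * (1 - alpha)**(j - i)
    """
    out = [0.0] * len(f)
    for j, fj in enumerate(f):
        for i in range(j + 1):
            out[i] += fj * comb(j, i) * alpha**i * (1 - alpha) ** (j - i)
    return out


def binomial_pmf(n, p):
    """pmf of Bin(n, p) as a list of length n + 1."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


# Thinning Bin(4, 0.5) by alpha = 0.5 should reproduce Bin(4, 0.25).
thinned = thin(binomial_pmf(4, 0.5), 0.5)
print([round(x, 6) for x in thinned])
print([round(x, 6) for x in binomial_pmf(4, 0.25)])
```

The double loop costs $O(N^2)$ for support size N, which is ample for the small examples used here.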

We briefly mention other notation used in this paper. We use
Po($\lambda$) to denote the Poisson distribution with mean $\lambda$, i.e., the
pmf is $\mathrm{po}(\lambda) = \{\mathrm{po}(i; \lambda), i = 0, 1, \ldots\}$, $\mathrm{po}(i; \lambda) = \lambda^i e^{-\lambda}/i!$.
The entropy of a discrete random variable X with pmf f is
defined as

$$H(X) = H(f) = \sum_i -f_i \log f_i,$$

and the relative entropy between X (with pmf f) and Y (with
pmf g) is defined as

$$D(X \| Y) = D(f \| g) = \sum_i f_i \log \frac{f_i}{g_i}.$$

For convenience we write $D(X) = D(X \| \mathrm{po}(\lambda))$, where $\lambda = EX$.
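Both functionals are easy to evaluate for finitely supported pmfs. The sketch below assumes natural logarithms (the base only rescales entropies and affects none of the inequalities in this paper); the names `entropy`, `log_po` and `D` are ours. For Bernoulli(0.3), D(X) can be compared with the closed form $0.7(\log 0.7 + 0.3) + 0.3^2$.

```python
from math import lgamma, log


def entropy(f):
    """H(f) = -sum_i f_i log f_i in nats, with the convention 0 log 0 = 0."""
    return -sum(p * log(p) for p in f if p > 0)


def mean(f):
    return sum(i * p for i, p in enumerate(f))


def log_po(i, lam):
    """log po(i; lam) = i log(lam) - lam - log(i!), valid for lam > 0."""
    return i * log(lam) - lam - lgamma(i + 1)


def D(f):
    """D(X) = D(f || po(lam)) with lam = EX; assumes EX > 0."""
    lam = mean(f)
    return sum(p * (log(p) - log_po(i, lam)) for i, p in enumerate(f) if p > 0)


bern = [0.7, 0.3]  # Bernoulli(0.3)
print(entropy(bern), D(bern))
```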

The thinning operation is intimately associated with the 
Poisson distribution and Poisson convergence theorems. It 
plays a significant role in the derivation of a maximum entropy 
property for the Poisson distribution (Johnson [7]). Recently
there has been evidence that, in a number of problems related
to information theory, the operation $T_\alpha$ is the discrete counterpart
of the operation of scaling a random variable by $\sqrt{\alpha}$, see
[5], [6], [7], [14]. Since scaling arguments can give simple
proofs of results such as the Entropy Power Inequality, we 
believe that improved understanding of the thinning operation 
could lead to discrete analogues of such results. 

For example, thinning lies at the heart of the following result
(see [5], [6], [14]), which is a Poisson limit theorem with an
information-theoretic interpretation.

Theorem 1 (Law of Thin Numbers): Let f be a pmf on
$\mathbb{Z}_+$ with mean $\lambda < \infty$. Denote by $f^{*n}$ the nth convolution
of f, i.e., the pmf of $\sum_{i=1}^n X_i$ where the $X_i$ are independent and
identically distributed (i.i.d.) with pmf f. Then

1) Ti/„(/*") converges point-wise to Po(A) as n — > oo; 

2) i/(Ti/„ (/*")) tends to H{po{X)) as n ^ oo; 

3) as n ^ oo, D(Ti/„(/*")) monotonically decreases to 
zero, if it is ever finite; 

4) if / is ultra-log-concave, then i/(Ti/„(/*")) increases 
in n. 

For Part (4), we recall that a random variable X on $\mathbb{Z}_+$ is
called ultra-log-concave, or ULC, if its pmf f is such that the
sequence $i!\, f_i$, i = 0, 1, ..., is log-concave. Examples of ULC
random variables include the binomial and the Poisson. In 
general, a sum of independent (but not necessarily identically 
distributed) Bernoulli random variables is ULC. Informally, a 
ULC random variable is less "spread out" than a Poisson with 
the same mean. Note that in Part (4) the ULC assumption is 
natural since, among ULC distributions with a fixed mean, the 
Poisson achieves maximum entropy ([7], [14]).
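The ULC property is simple to test numerically. For a pmf whose support has no internal zeros, log-concavity of $i!\, f_i$ is equivalent to $i f_i^2 \ge (i+1) f_{i-1} f_{i+1}$ for every $i \ge 1$. The hypothetical helper `is_ulc` below checks this with a small tolerance; it is a sketch for finitely supported pmfs only.

```python
def is_ulc(f, tol=1e-12):
    """Return True if the finitely supported pmf f (f[i] = Pr(X = i)) is ULC.

    ULC means i! f_i is log-concave: the support has no internal zeros and
    i * f_i**2 >= (i + 1) * f_{i-1} * f_{i+1} for every interior i >= 1.
    """
    support = [i for i, p in enumerate(f) if p > 0]
    if not support:
        return False
    if any(f[i] <= 0 for i in range(support[0], support[-1] + 1)):
        return False  # an internal zero breaks log-concavity
    return all(i * f[i] ** 2 + tol >= (i + 1) * f[i - 1] * f[i + 1]
               for i in range(1, len(f) - 1))


binomial = [0.25, 0.5, 0.25]                     # Bin(2, 1/2)
referee = [1 / 6, 2 / 3, 1 / 6]
geometric = [0.5 ** (k + 1) for k in range(10)]  # truncated geometric
print(is_ulc(binomial), is_ulc(referee), is_ulc(geometric), is_ulc([0.5, 0.0, 0.5]))
# True True False False
```

The geometric pmf fails because $f_i^2 = f_{i-1} f_{i+1}$, so it is log-concave but "more spread out" than the Poisson.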

Parts (2) and (3) of Theorem 1 (see [5], [6]) resemble
the entropic central limit theorem of Barron [2], in that
convergence in relative entropy, rather than the usual weak
convergence, is established. The monotonicity statements in
Parts (3) and (4), proved in [14], can be seen as the discrete
analogue of the monotonicity of entropy in the central limit
theorem, conjectured by Shannon and proved much later by
Artstein et al. [1].
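Parts (3) and (4) of Theorem 1 can be illustrated on one example. In the sketch below (our own helper names, natural logarithms, finite support), f is the Bin(2, 1/2) pmf, which is ULC with mean 1, so that $T_{1/n}(f^{*n})$ is Bin(2n, 1/(2n)). The printed values of D should decrease and those of H should increase with n; this illustrates the theorem and proves nothing.

```python
from math import comb, lgamma, log


def thin(f, alpha):
    out = [0.0] * len(f)
    for j, fj in enumerate(f):
        for i in range(j + 1):
            out[i] += fj * comb(j, i) * alpha**i * (1 - alpha) ** (j - i)
    return out


def convolve(f, g):
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def entropy(f):
    return -sum(p * log(p) for p in f if p > 0)


def D(f):
    """Relative entropy to the Poisson with the same (positive) mean."""
    lam = sum(i * p for i, p in enumerate(f))
    return sum(p * (log(p) - (i * log(lam) - lam - lgamma(i + 1)))
               for i, p in enumerate(f) if p > 0)


def thinned_convolution(f, n):
    """pmf of T_{1/n}(f^{*n})."""
    h = [1.0]
    for _ in range(n):
        h = convolve(h, f)
    return thin(h, 1.0 / n)


f = [0.25, 0.5, 0.25]  # Bin(2, 1/2): ULC with mean 1
Ds, Hs = [], []
for n in range(1, 7):
    h = thinned_convolution(f, n)
    Ds.append(D(h))
    Hs.append(entropy(h))
    print(n, round(Ds[-1], 6), round(Hs[-1], 6))
```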

In this work we further explore the behavior of entropy 
under thinning. Our main result is the following concavity 
property. 

Theorem 2: If X and Y are independent random variables
on $\mathbb{Z}_+$ with ultra-log-concave pmfs, then

$$H(T_\alpha X + T_\beta Y) \ge \alpha H(X) + \beta H(Y), \quad \alpha, \beta \ge 0,\ \alpha + \beta \le 1. \qquad (1)$$
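Theorem 2 can be spot-checked on examples. The sketch below takes X ~ Bin(3, 0.7) and Y with pmf (1/6, 2/3, 1/6), both ULC, and evaluates the gap $H(T_\alpha X + T_\beta Y) - \alpha H(X) - \beta H(Y)$ on a grid with $\beta = 1 - \alpha$, plus one case with $\alpha + \beta < 1$. The examples and helper names are our choices; a nonnegative gap on a grid is a sanity check, not a proof.

```python
from math import comb, log


def thin(f, alpha):
    out = [0.0] * len(f)
    for j, fj in enumerate(f):
        for i in range(j + 1):
            out[i] += fj * comb(j, i) * alpha**i * (1 - alpha) ** (j - i)
    return out


def convolve(f, g):
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def entropy(f):
    return -sum(p * log(p) for p in f if p > 0)


f = [comb(3, k) * 0.7**k * 0.3 ** (3 - k) for k in range(4)]  # X ~ Bin(3, 0.7)
g = [1 / 6, 2 / 3, 1 / 6]                                      # Y, also ULC
HX, HY = entropy(f), entropy(g)


def gap(alpha, beta):
    """H(T_alpha X + T_beta Y) - alpha H(X) - beta H(Y); Theorem 2 says >= 0."""
    lhs = entropy(convolve(thin(f, alpha), thin(g, beta)))
    return lhs - (alpha * HX + beta * HY)


gaps = [gap(k / 10, 1 - k / 10) for k in range(11)]
gaps.append(gap(0.3, 0.4))  # a case with alpha + beta < 1
print(min(gaps))
```

At the endpoints $\alpha = 0$ and $\alpha = 1$ the gap is zero, since one summand degenerates to the point mass at 0.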

Theorem 2 is interesting on two accounts. Firstly, it can be
seen as an analogue of the inequality

$$h(\sqrt{\alpha}\, X + \sqrt{1-\alpha}\, Y) \ge \alpha h(X) + (1 - \alpha) h(Y) \qquad (2)$$

where X and Y are continuous random variables with finite
variances and h denotes the differential entropy. The difference
between thinning by $\alpha$ in (1) and scaling by $\sqrt{\alpha}$ in (2) is
required to control different moments. In the discrete case,
the law of small numbers [5] and the corresponding maximum
entropy property [7] both require control of the mean, which
is achieved by this thinning factor. In the continuous case,
the central limit theorem [2] requires control of the variance,
which is achieved by this choice of scaling. It is well
known that (2) is a reformulation of Shannon's entropy power
inequality ([12], [13]). Thus Theorem 2 may be regarded as
a first step towards a discrete entropy power inequality (see
Section IV for further discussion).

Secondly, Theorem 2 is closely related to an open problem
of Shepp and Olkin [11] concerning Bernoulli sums. With a
slight abuse of notation let $H(a_1, \ldots, a_n)$ denote the entropy
of the sum $\sum_{i=1}^n X_i$, where $X_i$ is an independent Bernoulli
random variable with parameter $a_i$, i = 1, ..., n.

Conjecture 1 ([11]): The function $H(a_1, \ldots, a_n)$ is concave in $(a_1, \ldots, a_n)$, i.e.,

$$H(\alpha a_1 + (1 - \alpha) b_1, \ldots, \alpha a_n + (1 - \alpha) b_n) \ge \alpha H(a_1, \ldots, a_n) + (1 - \alpha) H(b_1, \ldots, b_n) \qquad (3)$$

for all $0 \le \alpha \le 1$ and $a_i, b_i \in [0, 1]$.

As noted by Shepp and Olkin [11], $H(a_1, \ldots, a_n)$ is
concave in each $a_i$ and is concave in the special case where
$a_1 = \cdots = a_n$ and $b_1 = \cdots = b_n$. We provide further
evidence supporting Conjecture 1 by proving another special
case, which is a consequence of Theorem 2 when applied to
Bernoulli sums.

Corollary 1: Relation (3) holds if $a_i b_i = 0$ for all i.
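Corollary 1 can likewise be spot-checked by computing Bernoulli-sum entropies through repeated convolution. In the sketch below the parameter vectors `a` and `b` are arbitrary illustrative choices satisfying $a_i b_i = 0$, and the helper names are ours.

```python
from math import log


def convolve(f, g):
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def entropy(f):
    return -sum(p * log(p) for p in f if p > 0)


def bernoulli_sum_entropy(params):
    """H(a_1, ..., a_n): entropy of a sum of independent Bernoulli(a_i) variables."""
    pmf = [1.0]
    for a in params:
        pmf = convolve(pmf, [1 - a, a])
    return entropy(pmf)


a = [0.3, 0.8, 0.0, 0.0]
b = [0.0, 0.0, 0.5, 0.9]  # a_i * b_i = 0 for every i, as in Corollary 1
Ha, Hb = bernoulli_sum_entropy(a), bernoulli_sum_entropy(b)
gaps = []
for k in range(11):
    alpha = k / 10
    mixed = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    gaps.append(bernoulli_sum_entropy(mixed) - (alpha * Ha + (1 - alpha) * Hb))
print(min(gaps))  # relation (3) predicts this is >= 0
```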

Conjecture 1 remains open. We are hopeful, however, that
the techniques introduced here could help resolve this long-standing
problem.

In Section II we collect some basic properties of thinning 
and ULC distributions, which are used in the proof of Theorem 
2 in Section III. Possible extensions are discussed in Section
IV. 



II. Preliminary observations 
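Basic properties of thinning include the semigroup relation
([7])

$$T_\alpha(T_\beta f) = T_{\alpha\beta} f \qquad (4)$$

and the commuting relation (* denotes convolution)

$$T_\alpha(f * g) = (T_\alpha f) * (T_\alpha g). \qquad (5)$$

It is (5) that allows us to deduce Corollary 1 from Theorem 2
easily: if $a_i b_i = 0$ for all i and X, Y are independent Bernoulli sums with parameters $a_i$ and $b_i$ respectively, then by (5) $T_\alpha X + T_{1-\alpha} Y$ is a Bernoulli sum with parameters $\alpha a_i + (1-\alpha) b_i$, and Bernoulli sums are ULC.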
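Both relations are exact identities between pmfs, so they can be confirmed to floating-point accuracy. The sketch below compares the two sides of (4) and (5) for arbitrary example pmfs of our choosing (helper names are ours).

```python
from math import comb


def thin(f, alpha):
    out = [0.0] * len(f)
    for j, fj in enumerate(f):
        for i in range(j + 1):
            out[i] += fj * comb(j, i) * alpha**i * (1 - alpha) ** (j - i)
    return out


def convolve(f, g):
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


f = [0.1, 0.3, 0.4, 0.2]
g = [0.5, 0.25, 0.25]
alpha, beta = 0.6, 0.3

# (4): T_alpha(T_beta f) = T_{alpha beta} f
semigroup_err = max(abs(x - y) for x, y in
                    zip(thin(thin(f, beta), alpha), thin(f, alpha * beta)))
# (5): T_alpha(f * g) = (T_alpha f) * (T_alpha g)
commute_err = max(abs(x - y) for x, y in
                  zip(thin(convolve(f, g), alpha), convolve(thin(f, alpha), thin(g, alpha))))
print(semigroup_err, commute_err)
```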

Concerning the ULC property, three important observations
([7]) are

1) a pmf f is ULC if and only if the ratio $(i+1) f_{i+1}/f_i$
is a decreasing function of i;

2) if f is ULC, then so is $T_\alpha f$;

3) if f and g are ULC, then so is their convolution $f * g$.

A key tool for deriving Theorem 2 and related results ([7],
[13]) is Chebyshev's rearrangement theorem, which states that
the covariance of two increasing functions of the same random
variable is non-negative. In other words, if X is a scalar
random variable, and g and $\tilde{g}$ are increasing functions, then
(assuming the expectations are finite)

$$E[g(X)\tilde{g}(X)] \ge E g(X)\, E \tilde{g}(X).$$

The same conclusion holds when g and $\tilde{g}$ are both decreasing, which is the form used in Section III.
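Chebyshev's rearrangement theorem can be illustrated on a small pmf. The sketch below (example pmf and functions chosen arbitrarily) computes the covariance for a pair of increasing functions, a pair of decreasing functions, and a pair with opposite monotonicity; only the last may be negative.

```python
from math import log


def expectation(pmf, fn):
    """E fn(X) for X with pmf[i] = Pr(X = i)."""
    return sum(p * fn(i) for i, p in enumerate(pmf))


def covariance(pmf, u, v):
    return (expectation(pmf, lambda x: u(x) * v(x))
            - expectation(pmf, u) * expectation(pmf, v))


pmf = [0.1, 0.2, 0.4, 0.2, 0.1]
cov_inc = covariance(pmf, lambda x: x * x, lambda x: log(1 + x))  # both increasing
cov_dec = covariance(pmf, lambda x: -x, lambda x: 1 / (1 + x))    # both decreasing
cov_mixed = covariance(pmf, lambda x: x, lambda x: -x)            # equals -Var(X)
print(cov_inc, cov_dec, cov_mixed)
```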

III. Proof of Theorem 2
The basic idea is to use the decomposition

$$H(X) = -D(X) - L(X),$$

where as before $D(X) = D(X \| \mathrm{po}(\lambda))$ with $\lambda = EX$, and
$L(X) = E \log(\mathrm{po}(X; \lambda))$.
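The decomposition is a rearrangement of $D(f \| \mathrm{po}(\lambda)) = \sum_i f_i \log f_i - \sum_i f_i \log \mathrm{po}(i; \lambda)$. A short numerical check, assuming natural logarithms and an example pmf of our choosing (`L_functional` is our name for L):

```python
from math import lgamma, log


def entropy(f):
    return -sum(p * log(p) for p in f if p > 0)


def mean(f):
    return sum(i * p for i, p in enumerate(f))


def log_po(i, lam):
    return i * log(lam) - lam - lgamma(i + 1)


def D(f):
    lam = mean(f)
    return sum(p * (log(p) - log_po(i, lam)) for i, p in enumerate(f) if p > 0)


def L_functional(f):
    """L(X) = E log po(X; lam) with lam = EX (assumed > 0)."""
    lam = mean(f)
    return sum(p * log_po(i, lam) for i, p in enumerate(f) if p > 0)


f = [0.2, 0.5, 0.3]
print(entropy(f), -D(f) - L_functional(f))  # the two numbers agree
```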

The behavior of the relative entropy D(X) under thinning
is fairly well understood. In particular, by differentiating
$D(T_\alpha X)$ with respect to $\alpha$ and then using a data-processing
argument, Yu [14] shows that

$$D(T_\alpha X) \le \alpha D(X). \qquad (6)$$

Further, for any independent U and V, the data-processing
inequality shows that $D(U + V) \le D(U) + D(V)$. By taking
$U = T_\alpha X$ and $V = T_{1-\alpha} Y$, one concludes that

$$D(T_\alpha X + T_{1-\alpha} Y) \le D(T_\alpha X) + D(T_{1-\alpha} Y) \le \alpha D(X) + (1 - \alpha) D(Y).$$

Therefore we only need to prove the corresponding result
for L, that is

$$L(T_\alpha X + T_{1-\alpha} Y) \le \alpha L(X) + (1 - \alpha) L(Y). \qquad (7)$$

Unfortunately, matters are more complicated because there
is no equivalent of the data-processing inequality, i.e., the
inequality $L(U + V) \le L(U) + L(V)$ does not always hold.
(Consider for example U and V i.i.d. Bernoulli with parameter
$p \in (0, 1)$. This inequality then reduces to $2p \le p^2$, which
clearly fails for all such p.)
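The Bernoulli counterexample can be made explicit. With U, V i.i.d. Bernoulli(p), $L(U) = p \log p - p$ and $L(U + V) = 2p \log(2p) - 2p - p^2 \log 2$, so $L(U+V) - L(U) - L(V) = (2p - p^2)\log 2 > 0$. The sketch below (natural logarithms, helper names ours) confirms this numerically for a few values of p.

```python
from math import lgamma, log


def log_po(i, lam):
    return i * log(lam) - lam - lgamma(i + 1)


def L_functional(f):
    """L(X) = E log po(X; lam) with lam = EX."""
    lam = sum(i * p for i, p in enumerate(f))
    return sum(p * log_po(i, lam) for i, p in enumerate(f) if p > 0)


for p in (0.1, 0.5, 0.9):
    u = [1 - p, p]                                # pmf of U (and of V)
    uv = [(1 - p) ** 2, 2 * p * (1 - p), p * p]   # pmf of U + V
    gap = L_functional(uv) - 2 * L_functional(u)  # positive, so subadditivity fails
    print(p, gap, (2 * p - p * p) * log(2))
```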

Nevertheless, it is possible to establish (7) directly. We
illustrate the strategy with a related but simpler result, which
involves the equivalent of Equation (6) for L.



Proposition 1: For any pmf f on $\mathbb{Z}_+$ with mean $\lambda < \infty$,
we have $H(T_\alpha f) \ge \alpha H(f)$ for $0 \le \alpha \le 1$.

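Proof: Let us assume that the support of f is finite; the
general case follows by a truncation argument ([14]). In view
of (6) we only need to show $l(\alpha) \le \alpha l(1)$, where

$$l(\alpha) = L(T_\alpha f) = \sum_{i \ge 0} (T_\alpha f)_i \log(\mathrm{po}(i; \alpha\lambda)).$$

By substituting $f(\alpha) = T_\alpha f$ in Equation (8) of [7], we obtain
that

$$\frac{d (T_\alpha f)_i}{d\alpha} = \frac{i (T_\alpha f)_i - (i+1)(T_\alpha f)_{i+1}}{\alpha},$$

and hence, using summation by parts,

$$\frac{d\, l(\alpha)}{d\alpha} = r(\alpha),$$

where

$$r(\alpha) = \lambda \log(\alpha\lambda) - \sum_{i \ge 0} \frac{d (T_\alpha f)_i}{d\alpha} \log i! = \lambda \log(\alpha\lambda) - \frac{1}{\alpha} \sum_{i \ge 0} (i+1)(T_\alpha f)_{i+1} \log(i+1).$$

In a similar way, using the inequality $\log(1 + u) \le u$, $u > -1$,

$$r'(\alpha) = \frac{\lambda}{\alpha} - \frac{1}{\alpha^2} \sum_{i \ge 0} (i+2)(i+1)(T_\alpha f)_{i+2} \log\frac{i+2}{i+1} \ge \frac{\lambda}{\alpha} - \frac{1}{\alpha^2} \sum_{i \ge 0} (T_\alpha f)_{i+2}(i+2) \ge 0.$$

The last inequality holds since $\sum_{i \ge 0} i (T_\alpha f)_i = \alpha\lambda$.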

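Having established the convexity of $l(\alpha)$, we can now
deduce the full Proposition using (6): since $l(0) = 0$, convexity gives $l(\alpha) \le \alpha l(1)$, and hence $H(T_\alpha f) = -D(T_\alpha f) - l(\alpha) \ge -\alpha D(f) - \alpha l(1) = \alpha H(f)$. ■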
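The convexity of $l(\alpha)$ can be observed numerically. The sketch below evaluates $l(\alpha)$ on a grid for an arbitrary finitely supported pmf (not necessarily ULC, since Proposition 1 does not require it), checks that the second differences are nonnegative, and checks $l(\alpha) \le \alpha l(1)$ and $H(T_\alpha f) \ge \alpha H(f)$. The example pmf, grid and helper names are our assumptions.

```python
from math import comb, lgamma, log


def thin(f, alpha):
    out = [0.0] * len(f)
    for j, fj in enumerate(f):
        for i in range(j + 1):
            out[i] += fj * comb(j, i) * alpha**i * (1 - alpha) ** (j - i)
    return out


def entropy(f):
    return -sum(p * log(p) for p in f if p > 0)


def l_of_alpha(f, alpha):
    """l(alpha) = L(T_alpha f) = sum_i (T_alpha f)_i log po(i; alpha * lam)."""
    F = thin(f, alpha)
    lam = alpha * sum(i * p for i, p in enumerate(f))
    if lam == 0:
        return 0.0  # T_0 f is the point mass at 0, and log po(0; 0) = 0
    return sum(p * (i * log(lam) - lam - lgamma(i + 1)) for i, p in enumerate(F) if p > 0)


f = [0.2, 0.1, 0.4, 0.3]
grid = [k / 20 for k in range(21)]
l_vals = [l_of_alpha(f, a) for a in grid]
second_diffs = [l_vals[k - 1] - 2 * l_vals[k] + l_vals[k + 1] for k in range(1, 20)]
print(min(second_diffs))  # nonnegative for a convex function
```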

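Before proving Theorem 2 we note that although (1) is
stated for $\alpha + \beta \le 1$, only the case $\alpha + \beta = 1$ needs to be
considered. Indeed, if (1) holds for $\alpha + \beta = 1$, then for general
$\alpha, \beta > 0$ such that $\alpha + \beta = \gamma < 1$, we have

$$H(T_\alpha X + T_\beta Y) = H(T_\gamma(T_{\alpha/\gamma} X + T_{\beta/\gamma} Y)) \ge \gamma H(T_{\alpha/\gamma} X + T_{\beta/\gamma} Y) \qquad (8)$$

$$\ge \alpha H(X) + \beta H(Y),$$

where (4) and (5) are used in the equality, Proposition 1
is used in (8), and the case $\alpha/\gamma + \beta/\gamma = 1$ of (1) is used in the last step.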

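Proof of Theorem 2: Assume $\beta = 1 - \alpha$, and let f and g
denote the pmfs of X and Y respectively. Assume $\lambda = EX > 0$
and $\mu = EY > 0$ to avoid the trivial case. As noted before,
we only need to show that

$$l(\alpha) = \sum_{i \ge 0} (T_\alpha f * T_\beta g)_i \log \mathrm{po}(i; \alpha\lambda + \beta\mu)$$

is convex in $\alpha$ (where $\beta = 1 - \alpha$). The calculations are similar
to (but more involved than) those for Proposition 1 and we
omit the details. The key is to express $l''(\alpha)$ in the following
form suitable for applying Chebyshev's rearrangement theorem:

$$l''(\alpha) = \frac{(\lambda - \mu)^2}{\alpha\lambda + \beta\mu} + A + B,$$

where

$$A = \sum_{i \ge 1, j \ge 0} (T_\alpha f)_i (T_\beta g)_j\, i\, a(i,j), \qquad B = \sum_{i \ge 0, j \ge 1} (T_\alpha f)_i (T_\beta g)_j\, j\, b(i,j),$$

and

$$a(i,j) = \frac{1}{\alpha^2}\left[(i+j-1)\log\frac{i+j-1}{i+j} - \frac{\mu j}{\beta(\alpha\lambda + \beta\mu)} \log\frac{i+j-1}{i+j}\right],$$

$$b(i,j) = \frac{1}{\beta^2}\left[(i+j-1)\log\frac{i+j-1}{i+j} - \frac{\lambda i}{\alpha(\alpha\lambda + \beta\mu)} \log\frac{i+j-1}{i+j}\right],$$

with the convention $0 \log 0 = 0$. Ultra-log-concavity and dominated convergence permit differentiating term-by-term.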

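For each fixed j, since $(i+j-1)\log((i+j-1)/(i+j))$
decreases in i and $\log((i+j-1)/(i+j))$ increases in i, we
know that $a(i,j)$ decreases in i. Since $T_\alpha f$ is ULC, the ratio
$i (T_\alpha f)_i/(T_\alpha f)_{i-1}$ is decreasing in i. Hence we may apply
Chebyshev's rearrangement theorem to the sum over i (two decreasing functions of i under the pmf $(T_\alpha f)_{i-1}$, $i \ge 1$, using $\sum_{i \ge 1} i (T_\alpha f)_i = \alpha\lambda$) and
obtain

$$A = \sum_{i \ge 1, j \ge 0} (T_\alpha f)_{i-1} (T_\beta g)_j \frac{i (T_\alpha f)_i}{(T_\alpha f)_{i-1}}\, a(i,j) \ge \alpha\lambda \sum_{i \ge 1, j \ge 0} (T_\alpha f)_{i-1} (T_\beta g)_j\, a(i,j) = \alpha\lambda \sum_{i, j \ge 0} (T_\alpha f)_i (T_\beta g)_j\, a(i+1, j). \qquad (9)$$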



Similarly, considering the sum over j, since $b(i,j)$ is decreasing
in j for any fixed i,

$$B \ge \beta\mu \sum_{i, j \ge 0} (T_\alpha f)_i (T_\beta g)_j\, b(i, j+1). \qquad (10)$$
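Adding up (9) and (10), and noting that

$$\alpha\lambda\, a(i+1, j) + \beta\mu\, b(i, j+1) = \frac{(\lambda - \mu)^2}{\alpha\lambda + \beta\mu}\, (i+j) \log\frac{i+j}{i+j+1},$$

we get

$$l''(\alpha) \ge \frac{(\lambda - \mu)^2}{\alpha\lambda + \beta\mu} \left[1 + \sum_{i, j \ge 0} (T_\alpha f)_i (T_\beta g)_j\, (i+j) \log\frac{i+j}{i+j+1}\right],$$

which is nonnegative, in view of the inequality $u \log(u/(u+1)) \ge -1$, $u > 0$, and since the weights $(T_\alpha f)_i (T_\beta g)_j$ sum to 1. ■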
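Because the calculation behind the expression for $l''(\alpha)$ is omitted, a numerical check may be reassuring. The sketch below compares a central finite-difference estimate of $l''(\alpha)$ with $(\lambda-\mu)^2/m + A + B$, where $m = \alpha\lambda + \beta\mu$, $a(i,j) = \alpha^{-2}[(i+j-1) - \mu j/(\beta m)]\log((i+j-1)/(i+j))$ and $b(i,j) = \beta^{-2}[(i+j-1) - \lambda i/(\alpha m)]\log((i+j-1)/(i+j))$, and it also evaluates the final lower bound. The example pmfs, the step size `h` and the helper names are our assumptions; agreement on examples supports, but does not replace, the omitted algebra.

```python
from math import comb, lgamma, log


def thin(f, alpha):
    out = [0.0] * len(f)
    for j, fj in enumerate(f):
        for i in range(j + 1):
            out[i] += fj * comb(j, i) * alpha**i * (1 - alpha) ** (j - i)
    return out


def convolve(f, g):
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def mean(f):
    return sum(i * p for i, p in enumerate(f))


def l_fun(f, g, alpha):
    """l(alpha) = sum_k h_k log po(k; m), h = T_alpha f * T_beta g, beta = 1 - alpha."""
    beta = 1 - alpha
    h = convolve(thin(f, alpha), thin(g, beta))
    m = alpha * mean(f) + beta * mean(g)
    return sum(p * (k * log(m) - m - lgamma(k + 1)) for k, p in enumerate(h) if p > 0)


def l2_formula(f, g, alpha):
    """(lam - mu)^2 / m + A + B with the coefficients a(i, j), b(i, j) above."""
    beta = 1 - alpha
    F, G = thin(f, alpha), thin(g, beta)
    lam, mu = mean(f), mean(g)
    m = alpha * lam + beta * mu
    A = B = 0.0
    for i, Fi in enumerate(F):
        for j, Gj in enumerate(G):
            k = i + j
            if k < 2:
                continue  # a(1, 0) = b(0, 1) = 0 by the convention 0 log 0 = 0
            Lk = log((k - 1) / k)
            if i >= 1:
                A += Fi * Gj * i * ((k - 1) - mu * j / (beta * m)) * Lk / alpha**2
            if j >= 1:
                B += Fi * Gj * j * ((k - 1) - lam * i / (alpha * m)) * Lk / beta**2
    return (lam - mu) ** 2 / m + A + B


def l2_lower_bound(f, g, alpha):
    """((lam - mu)^2 / m) * [1 + sum_{i,j} F_i G_j (i+j) log((i+j)/(i+j+1))]."""
    beta = 1 - alpha
    F, G = thin(f, alpha), thin(g, beta)
    lam, mu = mean(f), mean(g)
    m = alpha * lam + beta * mu
    s = 1.0
    for i, Fi in enumerate(F):
        for j, Gj in enumerate(G):
            k = i + j
            if k > 0:
                s += Fi * Gj * k * log(k / (k + 1))
    return (lam - mu) ** 2 / m * s


f = [0.027, 0.189, 0.441, 0.343]  # Bin(3, 0.7)
g = [1 / 6, 2 / 3, 1 / 6]
h = 1e-4  # finite-difference step
for alpha in (0.3, 0.5, 0.8):
    fd = (l_fun(f, g, alpha + h) - 2 * l_fun(f, g, alpha) + l_fun(f, g, alpha - h)) / h**2
    print(alpha, fd, l2_formula(f, g, alpha), l2_lower_bound(f, g, alpha))
```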

IV. Towards a discrete Entropy Power Inequality 

In the continuous case, (2) is quickly shown (see [4]) to be
equivalent to Shannon's entropy power inequality

$$\exp(2h(X + Y)) \ge \exp(2h(X)) + \exp(2h(Y)), \qquad (11)$$

valid for independent X and Y with finite variances, with
equality if and only if X and Y are normal. We aim to
formulate a discrete analogue of (11), with the Poisson distribution
playing the same role as the normal since it has
the corresponding infinite divisibility and maximum entropy
properties.

Observe that the function exp(2t) appearing in (11) is
(proportional to) the inverse of the entropy of the normal
with variance t. That is, if we write $e(t) = h(N(0,t)) = \log\sqrt{2\pi e t}$,
then the entropy power $v(X) = e^{-1}(h(X)) = \exp(2h(X))/(2\pi e)$, so Equation (11) can be written as

$$v(X + Y) \ge v(X) + v(Y).$$

Although there does not exist a corresponding closed form
expression for the entropy of a Poisson random variable, we
can denote $\mathcal{E}(t) = H(\mathrm{po}(t))$. Then $\mathcal{E}(t)$ is increasing and
concave. (The proof of Proposition 1, when specialized to the
Poisson case, implies this concavity.) Define

$$V(X) = \mathcal{E}^{-1}(H(X)).$$

That is, $H(\mathrm{po}(V(X))) = H(X)$. It is tempting to conjecture
that the natural discrete analogue of Equation (11) is

$$V(X + Y) \ge V(X) + V(Y),$$

for independent discrete random variables X and Y, with
equality if and only if X and Y are Poisson. However, this
is not true. A counterexample, provided by an anonymous
referee, is the case where X and Y both have the pmf
p(0) = 1/6, p(1) = 2/3, p(2) = 1/6. Since this pmf even lies
within the ULC class, the conjecture still fails when restricted
to this class.
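The referee's counterexample can be verified numerically. The sketch below computes $\mathcal{E}(t) = H(\mathrm{po}(t))$ from a truncated series, inverts it by bisection (valid because $\mathcal{E}$ is increasing), and compares V(X + Y) with V(X) + V(Y) for the pmf (1/6, 2/3, 1/6). The truncation length, bisection bracket and helper names are our assumptions.

```python
from math import exp, lgamma, log


def entropy(f):
    return -sum(p * log(p) for p in f if p > 0)


def convolve(f, g):
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def poisson_entropy(t, n_terms=200):
    """E(t) = H(po(t)) in nats, truncating the series after n_terms values."""
    if t <= 0:
        return 0.0
    total = 0.0
    for i in range(n_terms):
        log_p = i * log(t) - t - lgamma(i + 1)
        total -= exp(log_p) * log_p
    return total


def V(f, iters=100):
    """V(X) = E^{-1}(H(X)), found by bisection since E is increasing."""
    target = entropy(f)
    lo, hi = 0.0, 1.0
    while poisson_entropy(hi) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if poisson_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p = [1 / 6, 2 / 3, 1 / 6]
vx = V(p)
vsum = V(convolve(p, p))
print(vx, vsum, 2 * vx)  # V(X + Y) falls short of V(X) + V(Y)
```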

We believe that the discrete counterpart of the entropy power 
inequality should involve the thinning operation described 
above. If so, the natural conjecture is the following, which 
we refer to as the thinned entropy power inequality. 

Conjecture 2: If X and Y are independent random variables
with ULC pmfs on $\mathbb{Z}_+$, then ($0 \le \alpha \le 1$)

$$V(T_\alpha X + T_{1-\alpha} Y) \ge \alpha V(X) + (1 - \alpha) V(Y). \qquad (12)$$

In a similar way to the continuous case, (12) easily yields
the concavity of entropy, Equation (1), as a corollary. Indeed,
by (12) and since $\mathcal{E}(t)$ is increasing and concave, we have

$$H(T_\alpha X + T_{1-\alpha} Y) \ge \mathcal{E}(\alpha V(X) + (1 - \alpha) V(Y)) \ge \alpha \mathcal{E}(V(X)) + (1 - \alpha) \mathcal{E}(V(Y)) = \alpha H(X) + (1 - \alpha) H(Y),$$

and (1) follows.

Unlike the continuous case, (1) does not easily yield (12).
The key issue is the question of scaling. That is, in the continuous
case, the entropy power v(X) satisfies $v(\sqrt{\alpha}\, X) = \alpha v(X)$
for all $\alpha$ and X. It is this result that allows Dembo et al. [4]
to deduce (11) from (2).

Such an identity does not hold for thinned random variables.
However, we conjecture that

$$V(T_\alpha X) \ge \alpha V(X) \qquad (13)$$

for all $\alpha \in [0, 1]$ and ULC X. Note that this Equation (13), which
we refer to as the restricted thinned entropy power inequality
(RTEPI), is simply the case $Y \equiv 0$ of the full thinned entropy
power inequality (12). If (13) holds, we can use the argument
provided by [4] to deduce the following result, which is in
some sense close to the full thinned entropy power inequality,
although $\beta + \gamma < 1$ in general.

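Proposition 2: Consider independent ULC random variables
X and Y. For any $\beta, \gamma \in (0, 1)$ such that

$$\frac{\beta}{1 - \gamma} \le \frac{V(Y)}{V(X)} \le \frac{1 - \beta}{\gamma},$$

if the RTEPI (13) holds then

$$V(T_\beta X + T_\gamma Y) \ge \beta V(X) + \gamma V(Y).$$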

Proof: Note that an equivalent formulation of the RTEPI
(13) is that if X' is Poisson with H(X) = H(X') then for
any $\alpha \in (0, 1)$, $H(T_\alpha X) \ge H(T_\alpha X')$. Given X and Y we
define X' and Y' to be Poisson with H(X) = H(X') and
H(Y) = H(Y').

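Given $\beta$ and $\gamma$, we pick $\alpha$ such that $\beta \le \alpha$ and $\gamma \le 1 - \alpha$
so that:

$$H(T_\beta X + T_\gamma Y) = H(T_\alpha(T_{\beta/\alpha} X) + T_{1-\alpha}(T_{\gamma/(1-\alpha)} Y))$$

$$\ge \alpha H(T_{\beta/\alpha} X) + (1 - \alpha) H(T_{\gamma/(1-\alpha)} Y) \qquad (14)$$

$$\ge \alpha H(T_{\beta/\alpha} X') + (1 - \alpha) H(T_{\gamma/(1-\alpha)} Y') \qquad (15)$$

$$= \alpha \mathcal{E}(\beta V(X)/\alpha) + (1 - \alpha) \mathcal{E}(\gamma V(Y)/(1 - \alpha)),$$

where Equation (14) follows by Theorem 2 (applied to the ULC variables $T_{\beta/\alpha} X$ and $T_{\gamma/(1-\alpha)} Y$) and Equation (15)
follows by the reformulated RTEPI.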
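Now making the (optimal) choice

$$\alpha = \beta V(X)/(\beta V(X) + \gamma V(Y))$$

this inequality becomes

$$H(T_\beta X + T_\gamma Y) \ge \mathcal{E}(\beta V(X) + \gamma V(Y)).$$

The result follows by applying $\mathcal{E}^{-1}$ to both sides. Note that
the restrictions on $\beta$ and $\gamma$ are required to ensure $\beta \le \alpha$ and
$\gamma \le 1 - \alpha$. ■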
Again assuming (13), Proposition 2 yields the following
special case of (12). The reason this argument works is that,
as in [4], if X is Poisson then (13) holds with equality for all
$\alpha$.

Corollary 2: If the RTEPI (13) holds then (12) holds in the
special case where X is ULC and Y is Poisson with mean $\mu$
such that $\mu \le V(X)$.

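Proof: For $\gamma \in (0, 1)$ let Z be Poisson with mean $\mu(1 - \alpha)/\gamma$.
Then $V(Z) = \mu(1 - \alpha)/\gamma$. The condition $\mu \le V(X)$
ensures that we can choose $\gamma$ small enough such that

$$\frac{\alpha}{1 - \gamma} \le \frac{V(Z)}{V(X)} \le \frac{1 - \alpha}{\gamma}.$$

By Proposition 2,

$$V(T_\alpha X + T_\gamma Z) \ge \alpha V(X) + \gamma V(Z).$$

The claim follows by noting that $T_\gamma Z$ has the same Poisson
distribution as $T_{1-\alpha} Y$, and $\gamma V(Z) = (1 - \alpha)\mu = (1 - \alpha) V(Y)$. ■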
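The restricted inequality (13) can be spot-checked on small ULC examples using the same numerical inversion of $t \mapsto H(\mathrm{po}(t))$. The sketch below compares $V(T_\alpha X)$ with $\alpha V(X)$; the examples, the grid of $\alpha$ values, the series truncation and the helper names are our choices, and a check on a few cases is not a proof.

```python
from math import comb, exp, lgamma, log


def thin(f, alpha):
    out = [0.0] * len(f)
    for j, fj in enumerate(f):
        for i in range(j + 1):
            out[i] += fj * comb(j, i) * alpha**i * (1 - alpha) ** (j - i)
    return out


def entropy(f):
    return -sum(p * log(p) for p in f if p > 0)


def poisson_entropy(t, n_terms=200):
    """E(t) = H(po(t)) in nats (truncated series)."""
    if t <= 0:
        return 0.0
    total = 0.0
    for i in range(n_terms):
        log_p = i * log(t) - t - lgamma(i + 1)
        total -= exp(log_p) * log_p
    return total


def V(f, iters=100):
    """V(X) = E^{-1}(H(X)) by bisection."""
    target = entropy(f)
    lo, hi = 0.0, 1.0
    while poisson_entropy(hi) < target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if poisson_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


examples = {
    "Bin(2, 1/2)": [0.25, 0.5, 0.25],
    "(1/6, 2/3, 1/6)": [1 / 6, 2 / 3, 1 / 6],
    "Bin(3, 0.7)": [0.027, 0.189, 0.441, 0.343],
}
results = []
for name, f in examples.items():
    vf = V(f)
    for alpha in (0.2, 0.5, 0.8):
        lhs, rhs = V(thin(f, alpha)), alpha * vf  # (13) predicts lhs >= rhs
        results.append((name, alpha, lhs, rhs))
        print(name, alpha, round(lhs, 6), round(rhs, 6))
```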



We hope to report progress on (12) in future work. Given the
fundamental importance of (11), it would also be interesting
to see potential applications of (12) (if true) and (1). For
example, Oohama [9] used the entropy power inequality
(11) to solve the multi-terminal source coding problem. This
showed the rate at which information could be transmitted
from L sources, producing correlated Gaussian signals but
unable to collaborate or communicate with each other, under
the addition of Gaussian noise. It would be of interest to know
whether (12) could lead to a corresponding result for discrete
channels.

Note: Since the submission of this paper to ISIT09, we
have found a proof of the restricted thinned entropy power
inequality, i.e., Equation (13). The proof, based on [7], is
somewhat technical and will be presented in a future work.